Is Infrastructure as Code the Key to Moving DevOps Beyond ClickOps?

Transforming DevOps With Infrastructure As Code

Apr 23, 2024

Even in the cloud-native world, we can’t avoid dealing with infrastructure. What's worse, approaches such as microservices mean that some amount of responsibility for infrastructure is shifting to the project team. In this article, we’ll show that we as developers shouldn’t be afraid of infrastructure. Quite the opposite, with infrastructure as code, we can reuse much of our existing knowledge and put it to good use.

Clicking together infrastructure via the cloud provider’s web interface, aptly known as ClickOps, is rarely a viable option. Handwritten shell scripts often leave much to be desired. There is a lack of traceability and more importantly reusability. Even if the manual creation of infrastructure in the cloud is almost playfully simple, this approach is unsuitable for productive use, especially in groups; however, ClickOps certainly has its raison d’être for learning purposes.

Why Infrastructure as Code (IaC)?

Both automation and declarative APIs are generally regarded as pillars of cloud-native. If we apply these principles to infrastructure, the result is Infrastructure as Code (IaC). As the name suggests, we describe the infrastructure we need in the form of code, which isn’t unlike what we do every day with Java, for example. The exact form of the code naturally depends on the IaC tool of choice. Once the infrastructure has been described, the tool takes on the tedious task of actually creating it in the cloud (or elsewhere). When the inevitable changes come, e.g. if we want an additional machine or need more memory for our database, we change the description and the tool adapts the existing infrastructure accordingly. This approach gives us much better traceability, because we can review and version the code as usual. Reusability is also high: from one description, we can create several copies of the infrastructure (with slight adjustments if necessary) and thus ensure that DEV and PROD environments are as identical as possible, for example.
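
To make the declarative idea more tangible, the following TypeScript sketch shows what an IaC tool conceptually does: it compares the desired description with the current state and derives the necessary actions. The Resource type, the plan function and all resource names are hypothetical and heavily simplified; this is not the API of any real tool.

```typescript
// Hypothetical, heavily simplified resource model (not the API of any real IaC tool).
type Resource = { name: string; type: string; props: Record<string, string | number> };
type Action = { kind: "create" | "update" | "delete"; resource: Resource };

// Two property sets are equal if they have the same keys with the same values.
function sameProps(a: Resource["props"], b: Resource["props"]): boolean {
  const keys = Object.keys(a);
  return keys.length === Object.keys(b).length && keys.every((k) => a[k] === b[k]);
}

// Compare the desired description with the current state and derive the required actions.
function plan(desired: Resource[], current: Resource[]): Action[] {
  const actions: Action[] = [];
  for (const want of desired) {
    const have = current.filter((r) => r.name === want.name)[0];
    if (have === undefined) {
      actions.push({ kind: "create", resource: want });
    } else if (have.type !== want.type || !sameProps(have.props, want.props)) {
      actions.push({ kind: "update", resource: want });
    }
  }
  // Everything that exists but is no longer described gets removed.
  for (const have of current) {
    if (!desired.some((r) => r.name === have.name)) {
      actions.push({ kind: "delete", resource: have });
    }
  }
  return actions;
}

const currentState: Resource[] = [
  { name: "db", type: "database", props: { memoryGb: 2 } },
  { name: "old-cache", type: "cache", props: { nodes: 1 } },
];
const desiredState: Resource[] = [
  { name: "db", type: "database", props: { memoryGb: 4 } }, // more memory for the database
  { name: "worker", type: "machine", props: { count: 1 } }, // an additional machine
];

for (const action of plan(desiredState, currentState)) {
  console.log(`${action.kind} ${action.resource.name}`);
}
```

We only ever edit the desired state; working out the individual steps is the tool's job. Real tools additionally track dependencies between resources and decide whether a change can be made in place or requires a replacement.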

 

Why is this relevant for developers?

While infrastructure as code certainly plays an important role in the context of cloud-native, the question quickly arises as to why the topic of infrastructure is particularly relevant for us as developers. The answer to this question can be derived from cloud-native principles. Modular applications in the form of microservices are a cornerstone of the cloud-native approach. The various microservices are often developed by different teams. Each microservice requires a certain amount of infrastructure so that it can be provided. Even if a large part of this work is often carried out by a platform team in many larger companies, the project teams can rarely be completely shielded from the infrastructure issue; this is even more applicable to smaller companies.

Particularly when a team wants to rely on provider-specific cloud services, dealing with the topic of (cloud) infrastructure is unavoidable. As soon as the “golden path” defined by the platform team is abandoned, we as developers often have to get our own hands dirty. Of course, this is nothing new and has already become established in teams through mindsets such as DevOps. However, with cloud-native in particular, the focus is shifting even more towards infrastructure in the cloud.

However, we as developers should be pleased that Infrastructure as Code is becoming increasingly important, especially for the cloud. After all, code is our daily bread. We also benefit from the advantages of the cloud: many cloud services can be operated with significantly less effort and expertise than traditional infrastructure in our own data center. As developers, we can therefore manage the infrastructure for our services much more independently and confidently.

For low-level infrastructure tasks, such as network topologies, we can usually still rely on the platform team or IT operations in the traditional way. The high-level building blocks, such as databases, caches or serverless functions, are much more relevant for developers, and experience has shown that this is precisely the level that interests us. The low level is often rightly taken for granted as part of the platform.

Spoiled for choice

Now we can talk about the necessity of IaC at length, but in the end, we have to decide which tool to work with. First of all, it can be said that a lot of tools and technologies can be assigned to the IaC spectrum. For example, Kubernetes manifests (also an important topic in the context of cloud native) are certainly also code that can be used to describe infrastructure (especially when using Kubernetes operators). Tools such as Ansible, Chef, Puppet and others from the area of configuration management also have major overlaps with IaC. However, we will concentrate on the tools that are mentioned in the same breath as IaC, even if these in turn may have points of contact with Kubernetes or configuration management. We try to classify the various IaC tools into partially overlapping categories and evaluate them as objectively as possible. Depending on the given situation, each tool certainly has its raison d’être.

Terraform

Anyone who is even superficially involved with IaC will hardly be able to avoid Terraform [1]. Terraform is to a certain extent the top dog in the IaC sector. And not without good reason, as Terraform supports almost every cloud provider. In addition to so-called Terraform providers for Amazon Web Services (AWS), Google Cloud Platform (GCP) and Azure, there is also support for small regional providers such as the Open Telekom Cloud and Hetzner; specialized offerings such as services from Cloudflare can also be easily managed via Terraform.

The infrastructure is described in Terraform in its own language, known as HashiCorp Configuration Language (HCL for short). It’s therefore necessary to learn a new language to use Terraform. The good news is that the language is extremely simple and largely tailored to the IaC environment. Listing 1 shows a Terraform module that contains a Postgres database in AWS. The resulting address of the database is marked as output.

Listing 1

resource "aws_db_instance" "db" {
  identifier          = "terraform-db"
  engine              = "postgres"
  engine_version      = "15.3"
  allocated_storage   = 20
  instance_class      = "db.t4g.micro"
  username            = "cloud_user"
  password            = "cloud_password"
  publicly_accessible = true
  skip_final_snapshot = true
}

output "db_address" {
  value = aws_db_instance.db.address
}

Terraform is also able to describe infrastructure that includes and combines different cloud providers. Contrary to what is often assumed, however, Terraform does not initially represent an abstraction of different clouds. To describe a database in Terraform, an explicit decision must be made for which cloud this is to be done. For example, in AWS we describe that we need an RDS instance, in Azure an Azure SQL database and in GCP a Cloud SQL instance. To use Terraform to describe infrastructure in AWS, for example, the corresponding AWS knowledge is also required.

Even more complex infrastructures can be easily tamed in Terraform, as the option of modularization is provided. In particular, recurring constructs can be encapsulated in reusable modules and thus also made available to other teams. It is therefore quite conceivable that a platform team could provide ready-made modules as part of a golden path to cast typical infrastructure requirements into code and thus make them easily usable for individual project teams. Listing 2 shows the use of such a reusable microservices module. Internally, this could lead to the creation of a load balancer, two Docker containers and a database, for example.

Listing 2

module "microservice" {
  source         = "./modules/microservice"
  image_name     = "address-service"
  image_tag      = "2.1.0"
  instance_count = 2
  db_name        = "address-db"
}

output "service_url" {
  value = module.microservice.service_url
}

At the same time, the use of a dedicated language ensures that the description of the infrastructure remains comprehensible, as programmability is deliberately limited to predefined functions and simple loop constructs. After describing the first complex infrastructure, you quickly learn to appreciate the level of abstraction chosen by Terraform. If you run into its limitations, it is often a sign that you are overcomplicating things. Infrastructure should be kept comprehensible and as simple as possible, and Terraform often, if not always, nudges you in that direction. Terraform is a particularly good choice when different cloud providers are combined and no complex abstractions are required. It also copes well with more complex infrastructure. As Terraform is probably the most widely used tool, the large amount of available documentation also speaks in its favor.

 

Cloud-specific tools

The three major cloud providers all offer their own languages and tools for describing infrastructure: CloudFormation at AWS, Azure Resource Manager templates at Azure and Google Cloud Deployment Manager at GCP. This is a clear sign that the cloud providers themselves consider Infrastructure as Code to be an important means of using the cloud. Listing 3 shows a simple CloudFormation example that can be used to manage a Postgres database in AWS.

Listing 3

Resources:
  DB:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: postgres
      EngineVersion: "15.3"
      AllocatedStorage: 20
      DBInstanceClass: db.t4g.micro
      MasterUsername: cloud_user
      MasterUserPassword: cloud_password
      PubliclyAccessible: true
      DeletionProtection: false
Outputs:
  DBAddress:
    Value: !GetAtt DB.Endpoint.Address

Provider tools are significantly less expressive than other IaC tools. Instead of relying on a dedicated language, they describe the infrastructure in JSON or YAML. Even simple operations (such as dynamically assembling a string) have to be expressed in JSON or YAML, which honestly feels as strange as it sounds. While modularization is certainly possible in provider-specific tools, it often feels a little clunky in practice. Beyond a certain size, complex infrastructures in particular are difficult to describe in JSON or YAML, despite modularization, and the lack of clarity makes the inevitable later changes to the description harder still. Nevertheless, these tools also have certain advantages. They are often the built-in way to describe infrastructure via code. As they come directly from the provider, they are generally considered to be more compatible with the respective cloud. The description can often be applied directly via the provider’s web interface without installing a tool; this is not easily possible with third-party tools such as Terraform.
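
To illustrate why assembling a string inside a template feels awkward, the following sketch models CloudFormation's Fn::Join intrinsic function as a plain TypeScript object and contrasts it with an ordinary template literal. The resolveJoin helper is hypothetical and only mimics what the service evaluates on our behalf; the host name is made up, and real templates would typically reference other resources instead of fixed strings.

```typescript
// Shape of CloudFormation's Fn::Join intrinsic: [delimiter, [values...]].
// Simplification: real templates may nest further intrinsics in the value list.
type FnJoin = { "Fn::Join": [string, string[]] };

// Hypothetical helper that mimics how the intrinsic is evaluated.
function resolveJoin(expr: FnJoin): string {
  const [delimiter, parts] = expr["Fn::Join"];
  return parts.join(delimiter);
}

const dbHost = "db.example.internal"; // made-up value for illustration

// String assembly as it has to be written inside a JSON/YAML template ...
const templateStyle: FnJoin = {
  "Fn::Join": ["", ["postgres://", dbHost, ":5432/", "address-db"]],
};

// ... and the same string in a general-purpose language.
const codeStyle = `postgres://${dbHost}:5432/address-db`;

console.log(resolveJoin(templateStyle) === codeStyle);
```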

One disadvantage is that knowledge gained with a cloud-specific tool is much harder to transfer to other tools. However, if it is clear that only one specific cloud is an option, this doesn’t have to be a problem. Azure Bicep also deserves a mention: a tool provided by Azure that describes infrastructure specifically in Azure using a purpose-built language. Roughly speaking, Bicep can be described as a kind of Terraform for Azure and can certainly offer advantages over using Azure Resource Manager directly. Such tools can be a good choice if the infrastructure is less complex and only one cloud provider is relevant. Tools such as CloudFormation can also be the right choice if you want to benefit from official documentation or even vendor support.

Pulumi and CDKs

The Pulumi tool [2] and numerous other tools that can be summarized under the term CDK (Cloud Development Kit) take a different approach; the latter have become particularly well-known thanks to the AWS CDK. Instead of JSON/YAML or a separate (declarative) language, well-known programming languages such as Java, Python or TypeScript can be used to describe the infrastructure. This approach lives up to the word code in Infrastructure as Code. Listing 4 shows a Pulumi example that uses TypeScript to describe a Postgres database in AWS.

Listing 4

import * as aws from "@pulumi/aws";

const db = new aws.rds.Instance("db", {
  identifier: "pulumi-db",
  engine: "postgres",
  engineVersion: "15.3",
  allocatedStorage: 20,
  instanceClass: aws.rds.InstanceType.T4G_Micro,
  username: "cloud_user",
  password: "cloud_password",
  publiclyAccessible: true,
  skipFinalSnapshot: true,
});

export const dbAddress = db.address;

Unlike handwritten scripts that access the cloud programmatically via API, the result is still a declarative description of the infrastructure. When using the AWS CDK, the result is a CloudFormation template in the form of a JSON file, even if it remains largely hidden from the user of the AWS CDK. Conceptually, IaC tools of this type compile high-level code into a comparatively simple format that is then used to manage the infrastructure in the cloud. Programming the infrastructure description in this way promises a number of advantages. For us developers in particular, it often means that we don’t have to learn a new language. Instead, we can stay in our familiar Java world, which is definitely one less hurdle. At the same time, we can profit from the existing ecosystem of the language, be it through existing libraries or through tools such as our IDE. Nevertheless, the unrestricted use of general-purpose programming languages also harbors a danger. The great flexibility can quickly lead to incomprehensible infrastructure, especially if too many abstractions are introduced. As developers, we tend to apply our usual thought patterns as soon as we are in our element. In contrast to normal software development, however, the description of infrastructure relies much more on simple and repetitive code. Abstracting parts of the infrastructure is a powerful approach, but it should be used with caution and only where it brings real added value.
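
The "compile" step can be pictured roughly as in the following sketch. It is not AWS CDK code: the Stack class and its addPostgres and synth methods are hypothetical names chosen for illustration. Only the template shape (Resources, Type, Properties) and the property names follow Listing 3, and the generated template is deliberately abbreviated.

```typescript
// Minimal shape of a resource entry in a CloudFormation-style template.
type CfnResource = { Type: string; Properties: Record<string, unknown> };

// Hypothetical stand-in for a CDK-like stack; not part of any real library.
class Stack {
  private readonly resources: Record<string, CfnResource> = {};

  // Ordinary code (methods, parameters, loops) is used to collect resources.
  addPostgres(logicalId: string, storageGb: number): this {
    this.resources[logicalId] = {
      Type: "AWS::RDS::DBInstance",
      Properties: {
        Engine: "postgres",
        AllocatedStorage: String(storageGb),
        DBInstanceClass: "db.t4g.micro",
      },
    };
    return this;
  }

  // "Compile" the collected resources into a purely declarative JSON document.
  synth(): string {
    return JSON.stringify({ Resources: this.resources }, null, 2);
  }
}

const demoStack = new Stack();
for (const stage of ["Dev", "Prod"]) {
  demoStack.addPostgres(`Db${stage}`, stage === "Prod" ? 100 : 20);
}
const synthesized = demoStack.synth();
console.log(synthesized);
```

The loop and the conditional exist only in the program. The output is a static document that contains two fully spelled-out database resources, which is why the result remains declarative.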

Programmable infrastructure description is definitely a trend. With the CDK for Terraform, it is now also possible to use existing Terraform providers without having to learn a new language. With cdk8s, there is also a CDK representative in the Kubernetes space. As these CDKs are based on the concepts of the AWS CDK, you quickly feel at home when jumping from CDK to CDK. It makes sense to rely on tools such as Pulumi when a high degree of modularization and, in particular, abstraction is required, especially when other tools reach their limits. If this is not necessary, you run the risk of overusing the great freedom of general-purpose programming languages and thus creating incomprehensible infrastructure descriptions.

Trial and error as an important (learning) procedure

Now the question arises: how do you actually learn the cloud? It is certainly worth taking the first steps in ClickOps style via the web interface. That way, you can at least get a good feel for what services are available and how they roughly work. However, as mentioned at the beginning, Infrastructure as Code is the way to manage infrastructure productively. There are various ways to learn a complex topic. While trial and error does not have a good reputation as a method, it is a suitable way to learn the cloud and to build up complex cloud infrastructure step by step later on. Fortunately, Infrastructure as Code is made for trial and error, as infrastructure can not only be created and modified at the “push of a button”, but can also be completely removed again. Since most cloud providers charge according to usage, creating infrastructure and destroying it again a few moments later costs next to nothing. In addition, many IaC tools show what impact a changed infrastructure description will have on the actual infrastructure in the cloud and whether the changes are possible in this form at all. Listing 5 shows such an output from Terraform. And even if, in rare cases, errors occur when the infrastructure is actually changed, the infrastructure description can be adapted afterwards. If in doubt, simply start again from the beginning.

Listing 5

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_db_instance.db will be created
  + resource "aws_db_instance" "db" {
      + address        = (known after apply)
      + engine         = "postgres"
      + engine_version = "15.3"
      + identifier     = "terraform-db"
      + password       = (sensitive value)
      + username       = "cloud_user"
      ...
    }

Plan: 1 to add, 0 to change, 0 to destroy.
...

While we are certainly a little more cautious with the infrastructure for the PROD stage, this approach can be applied relatively safely in the DEV stage to identify potential problems early and avoid them in later stages. In practice, it has also become established to separate stateful and stateless infrastructure. For example, you would certainly be more careful with an infrastructure description that contains a database than with one that only contains machines running stateless applications. The latter can be deleted and recreated without causing permanent disruption to operations, apart from a small amount of downtime.
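
One way to act on this distinction is to check a plan automatically before it is applied. The following sketch assumes the machine-readable plan that terraform show -json prints for a saved plan file and uses only its resource_changes entries with their address, type and change.actions fields. Which resource types count as stateful is an assumption made for this example, and the function name is hypothetical.

```typescript
// Subset of the JSON printed by `terraform show -json <planfile>`.
type ResourceChange = { address: string; type: string; change: { actions: string[] } };
type PlanJson = { resource_changes?: ResourceChange[] };

// Assumption: which resource types are treated as stateful is a team decision.
const STATEFUL_TYPES = ["aws_db_instance", "aws_s3_bucket"];

// Return the addresses of stateful resources that the plan would delete.
// A replacement shows up as a combination of "delete" and "create" and is caught too.
function statefulDeletions(planJson: PlanJson): string[] {
  return (planJson.resource_changes || [])
    .filter((rc) => STATEFUL_TYPES.indexOf(rc.type) !== -1)
    .filter((rc) => rc.change.actions.indexOf("delete") !== -1)
    .map((rc) => rc.address);
}

const examplePlan: PlanJson = {
  resource_changes: [
    { address: "aws_db_instance.db", type: "aws_db_instance", change: { actions: ["delete", "create"] } },
    { address: "aws_instance.app", type: "aws_instance", change: { actions: ["delete"] } },
  ],
};

const blocked = statefulDeletions(examplePlan);
if (blocked.length > 0) {
  console.log(`Manual approval needed for: ${blocked.join(", ")}`);
}
```

In this example, the stateless machine may be deleted without further ado, while the replacement of the database is flagged for a closer look.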

Not just for the cloud

It’s not just the cloud that plays a major role in the cloud-native sector. Kubernetes is also an important part of the practice, i.e. the part that we as developers are often confronted with. In fact, IaC tools such as Terraform and Pulumi are suitable tools for managing deployments on a Kubernetes cluster. This also makes these tools much more relevant for us developers.

Terraform and Pulumi are particularly brilliant when infrastructure in the cloud is connected with resources in Kubernetes, thus creating a description that not only includes the infrastructure but also the deployments operated on it. The added value quickly becomes clear if the deployment requires information that is only known when the infrastructure is created. For example, a Postgres database could be created in the cloud in the infrastructure description and the required address information could be stored in the cluster as a Kubernetes config map in the same breath so that the deployed application can access the database. Listing 6 shows a simplified example of this. The IaC tool can serve as a kind of bridge between infrastructure and deployment. There are numerous use cases for this.

Listing 6

resource "aws_db_instance" "db" {
  identifier     = "terraform-db"
  engine         = "postgres"
  engine_version = "15.3"
  ...
}

resource "kubernetes_config_map" "db_address" {
  metadata {
    name = "db-address"
  }
  data = {
    db_address = aws_db_instance.db.address
  }
}
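
The same bridge can be sketched with Pulumi. The following TypeScript program is a minimal sketch that assumes a Pulumi project with the @pulumi/aws and @pulumi/kubernetes packages installed and a Kubernetes cluster reachable through the default kubeconfig. The database arguments are taken from Listing 4; the config map name mirrors Listing 6.

```typescript
import * as aws from "@pulumi/aws";
import * as k8s from "@pulumi/kubernetes";

// Database in the cloud, described as in Listing 4.
const db = new aws.rds.Instance("db", {
  identifier: "pulumi-db",
  engine: "postgres",
  engineVersion: "15.3",
  allocatedStorage: 20,
  instanceClass: aws.rds.InstanceType.T4G_Micro,
  username: "cloud_user",
  password: "cloud_password",
  skipFinalSnapshot: true,
});

// Config map in the cluster; db.address is only known once the database exists,
// so the config map is automatically created after the database.
new k8s.core.v1.ConfigMap("db-address", {
  metadata: { name: "db-address" },
  data: { db_address: db.address },
});
```

Because the address is passed on as an output of the database resource, the tool can derive the order of creation itself; no explicit dependency has to be declared.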

Familiar tooling and procedure

Cloud is certainly a new world for many and IaC also brings with it one or two innovations. Fortunately, however, much of what we as developers already do on a daily basis can be applied to Infrastructure as Code, mainly thanks to the fact that it is code. Ideally, we should treat infrastructure code in the same way as normal code. This means that it is stored in a version control system such as Git and, like normal code, also undergoes a review process to uncover potential problems and quality defects at an early stage. High quality standards should also apply to this type of code and should be checked and enforced using automated tools such as linters and formatters. Unit tests and, above all, integration tests are also suitable means of gaining more confidence when working with cloud infrastructure. Integration tests in particular, which actually create the infrastructure in the cloud temporarily and then remove it again, can be worth their weight in gold when it comes to avoiding nasty surprises. In general, many practices from software development can be applied more or less directly to the area of infrastructure as code. In this respect, we as developers also have a certain advantage, as these practices are rarely the norm in purely operational teams.
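
What a unit test for infrastructure code might look like is sketched below. The DbConfig type reuses property names from Listing 4; the checkDatabase function and the two rules it enforces are hypothetical examples of team conventions, not rules prescribed by any tool.

```typescript
// Properties of a database description, named as in Listing 4.
type DbConfig = {
  identifier: string;
  publiclyAccessible: boolean;
  skipFinalSnapshot: boolean;
};

// Hypothetical team rules: PROD databases must be private and keep a final snapshot.
function checkDatabase(stage: "DEV" | "PROD", db: DbConfig): string[] {
  const violations: string[] = [];
  if (stage === "PROD" && db.publiclyAccessible) {
    violations.push(`${db.identifier}: must not be publicly accessible in PROD`);
  }
  if (stage === "PROD" && db.skipFinalSnapshot) {
    violations.push(`${db.identifier}: final snapshot must not be skipped in PROD`);
  }
  return violations;
}

// The settings from Listing 4 are fine for experiments, but not for PROD.
const listingConfig: DbConfig = {
  identifier: "pulumi-db",
  publiclyAccessible: true,
  skipFinalSnapshot: true,
};

console.log(checkDatabase("DEV", listingConfig).length);  // no violations in DEV
console.log(checkDatabase("PROD", listingConfig).length); // both rules violated in PROD
```

Checks of this kind run in milliseconds and without cloud access, which makes them a good complement to the slower integration tests that create real infrastructure.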

IaC in the CI/CD pipeline

As developers, we are now used to a certain degree of automation thanks to our CI/CD pipelines. We push our changes to the main branch and the latest version of our customized application is already running on the DEV stage. Deployment to the PROD stage is now also often just a tagged commit or a push to a special branch away. It is therefore not surprising that we also tend to release our infrastructure description to the cloud as part of the CI/CD pipeline (e.g. when pushing to the main branch). The first reflex is of course not wrong, but in practice, opinions differ on the extent to which infrastructure changes should be automated.

Unlike the deployment of modern applications, rolling back infrastructure in the event of errors can be problematic. Even where it is possible to roll back to the previous infrastructure state, not every tool does this automatically. At the same time, trial and error is significantly less efficient if the infrastructure change is only carried out on push. A much shorter feedback loop is often advisable here, and it is not uncommon for IaC tools to still be run by developers on their own machines. In order to integrate IaC tools into the CI/CD pipeline in a meaningful way, there should be a clear review policy in which the planned changes to the infrastructure are clearly visible, e.g. as an automated comment with the output of terraform plan in a merge request. If the changes are accepted, the infrastructure can be adapted. Tools such as Atlantis [3] support the implementation of such a workflow. To avoid errors even after a review, it is advisable for developers to first test the infrastructure description in a temporary environment. Here, changes can be tried out in peace and quiet until a functional infrastructure description emerges. Ideally, this environment should be as close as possible to the DEV and PROD stages (and everything in between) in order to minimize the risk when executing against the production infrastructure.
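
How such a review comment could be assembled is sketched below. The sketch again assumes the resource_changes entries from the JSON representation of a Terraform plan; the reviewComment function is a hypothetical name, and actually posting the text to a merge request is omitted because it depends on the platform in use.

```typescript
// Subset of one entry in the "resource_changes" list of a Terraform JSON plan.
type PlannedChange = { address: string; change: { actions: string[] } };

function hasAction(c: PlannedChange, action: string): boolean {
  return c.change.actions.indexOf(action) !== -1;
}

// Build a short, human-readable summary of a plan for reviewers.
function reviewComment(changes: PlannedChange[]): string {
  // Ignore entries that do not modify anything (e.g. "no-op").
  const relevant = changes.filter(
    (c) => hasAction(c, "create") || hasAction(c, "update") || hasAction(c, "delete"),
  );
  // A replacement contains both "create" and "delete" and counts towards both totals.
  const add = relevant.filter((c) => hasAction(c, "create")).length;
  const change = relevant.filter((c) => hasAction(c, "update")).length;
  const destroy = relevant.filter((c) => hasAction(c, "delete")).length;

  const header = `Plan: ${add} to add, ${change} to change, ${destroy} to destroy.`;
  const lines = relevant.map((c) => `- ${c.address}: ${c.change.actions.join(" + ")}`);
  return [header].concat(lines).join("\n");
}

const plannedChanges: PlannedChange[] = [
  { address: "aws_db_instance.db", change: { actions: ["create"] } },
  { address: "aws_instance.app", change: { actions: ["no-op"] } },
];

console.log(reviewComment(plannedChanges));
```

For the plan from Listing 5, the first line of the comment matches the summary line that Terraform itself prints, followed by one line per affected resource.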

Conclusion 

Even in the age of cloud-native, we cannot completely avoid the topic of infrastructure. Especially with the increasing responsibility in largely self-sufficient teams, the management of infrastructure often falls into the hands of developers. Fortunately, there are a few aspects that play into our hands.

With Infrastructure as Code as a best practice, we can reuse much of the experience we have gained in development, even if we certainly have to deviate a little from established thought patterns here and there. Many of the typical challenges in the area of infrastructure are also much easier to overcome thanks to the cloud. We barely have to think about issues such as hardware any more, and the cloud also largely relieves us of setting up fail-safe systems, such as relational databases, with managed services. Of course, there are still some areas in which the expertise of operations and perhaps a platform team is required; we have deliberately excluded the entire area of monitoring here, for example. Nevertheless, there’s no better time for developers to learn about infrastructure than right now and to profitably incorporate the knowledge they have gained into their own team.

Links & Literature

[1] https://www.terraform.io

[2] https://www.pulumi.com

[3] https://www.runatlantis.io
